Idioms in Context: The IDIX Corpus

Authors

  • Caroline Sporleder
  • Linlin Li
  • Philip Gorinski
  • Xaver Koch
Abstract

Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratically. Consequently, there has been much research on the automatic detection and extraction of idiomatic expressions. Most studies focus on type-based idiom detection, i.e., distinguishing whether a given expression can (potentially) be used idiomatically. However, many expressions such as break the ice can have both literal and non-literal readings and need to be disambiguated in a given context (token-based detection). So far relatively few approaches have attempted context-based idiom detection. One reason for this may be that few annotated resources are available that disambiguate expressions in context. With the IDIX corpus, we aim to address this. IDIX is available as an add-on to the BNC and disambiguates different usages of a subset of idioms. We believe that this resource will be useful both for linguistic and computational linguistic studies.

Similar resources

The Impact of Context on the Learning and Retention of Idioms

The purpose of the present study was to investigate the effect of context on learning idioms among 60 Iranian female advanced English learners. To this end, the researcher assigned the participants to two experimental groups and one control group: Group 1 (first experimental group, the extended-context group), Group 2 (second experimental group, the limited-context group) and Group 3 (control g...

Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features

Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and the results of an idiom identification experiment using the corpus. The corpus targets 146 amb...

Drawing a Line between Literal and Idiomatic Meanings Based on Supervised WSD

Hashimoto, Chikara and Kawahara, Daisuke. 2008. Drawing a Line between Literal and Idiomatic Meanings Based on Supervised WSD. Linguistic Research 25(2), 105-123. Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructe...

From etymology to modern phraseology: A corpus-based study of structural variants of Chinese idioms in naturally-occurring contexts

Compared with recent developments in English corpus lexicology and phraseology, the study of Chinese lexicology and phraseology still remains at a level similar to what Fernando has described as quasi-lexicography (Fernando, 1996: 10-11) in the study of English idioms, where explanations provided in idiom dictionaries are rather prescriptive and static than descriptive and dynamic: a typical te...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Publication date: 2010